Maximum Entropy Learning Model for Biomedical Semantic Type Induction

Author

  • Ekaterina Shutova
Abstract

In the biomedical literature, gene names are often used to refer to biomedical entities other than genes. Information extraction tasks therefore need to distinguish between all such biomedical entities. Our approach trains a Maximum Entropy classifier for this task on a large, automatically created training corpus rather than on manually annotated data. To create the training corpus, a rule-based baseline tagger is developed using information from the Sequence Ontology, which classifies biomedical entities into seven biotypes. This tagger is then applied to a large corpus of biomedical text to annotate it, generating the training data for the machine learning system. The results show that the maximum entropy classifier trained on automatically created data performs better than the baseline tagger itself, which demonstrates the effectiveness of the adopted approach. The developed machine learning techniques are domain-independent and can be applied to text mining in any field.
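
The pipeline described in the abstract (rule-based tagger, automatic annotation, maximum entropy training) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the cue words in `RULES`, the two labels (the abstract mentions seven biotypes but does not list them), the toy sentences, and all function names are assumptions made up for this example. The maximum entropy model is written as a small multinomial logistic regression trained by stochastic gradient ascent over bag-of-words features.

```python
import math
from collections import defaultdict

# Hypothetical cue words and labels, for illustration only; the paper's
# actual rules and its seven Sequence Ontology biotypes are not given here.
RULES = {
    "gene": {"expression", "promoter"},
    "protein": {"binds", "kinase"},
}
LABELS = sorted(RULES)

def baseline_tag(context):
    """Rule-based tagger: first label whose cue word occurs in the context, else None."""
    words = set(context.lower().split())
    for label, cues in RULES.items():
        if words & cues:
            return label
    return None

def features(context):
    """Bag-of-words indicator features for the context of a mention."""
    return {"w=" + w for w in context.lower().split()}

def predict_proba(weights, feats):
    """Softmax over summed feature weights: p(y | x) in a maximum entropy model."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

def train_maxent(examples, epochs=100, lr=0.5):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)  # keyed by (feature, label)
    for _ in range(epochs):
        for feats, gold in examples:
            probs = predict_proba(weights, feats)
            for y in LABELS:
                # Gradient for an active feature: observed minus expected count.
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(f, y)] += lr * grad
    return weights

def classify(weights, context):
    probs = predict_proba(weights, features(context))
    return max(probs, key=probs.get)

# Toy unannotated corpus (invented sentences).
corpus = [
    "p53 expression was reduced in tumour cells",
    "the p53 promoter is methylated in tumour samples",
    "p53 binds DNA and activates transcription",
    "the kinase phosphorylates p53 and activates signalling",
    "p53 was studied in mice",  # no rule fires, so this stays unannotated
]

# Step 1: annotate the corpus automatically with the baseline tagger.
training = [(features(s), baseline_tag(s)) for s in corpus
            if baseline_tag(s) is not None]

# Step 2: train the maximum entropy classifier on the automatic annotations.
weights = train_maxent(training)

# Step 3: classify a context in which no hand-written rule applies.
unseen = "BRCA1 activates transcription"
print(baseline_tag(unseen))
print(classify(weights, unseen))
```

In this toy setting the baseline tagger returns no label for the last sentence, while the trained classifier assigns `protein` because it has picked up context words that co-occur with the rule cues. That is one plausible way a classifier trained on automatically created data can go beyond the tagger that produced it; it is an illustration only and says nothing about the results reported in the paper.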

Similar resources

BIOSMILE: Adapting Semantic Role Labeling for Biomedical Verbs: An Exponential Model Coupled with Automatically Generated Template Features

In this paper, we construct a biomedical semantic role labeling (SRL) system that can be used to facilitate relation extraction. First, we construct a proposition bank on top of the popular biomedical GENIA treebank following the PropBank annotation scheme. We only annotate the predicate-argument structures (PAS’s) of thirty frequently used biomedical predicates and their corresponding argument...

Two-Phase Biomedical Named Entity Recognition Using A Hybrid Method

Biomedical named entity recognition (NER) is a difficult problem in biomedical information processing due to the widespread ambiguity of terms out of context and extensive lexical variations. This paper presents a two-phase biomedical NER consisting of term boundary detection and semantic labeling. By dividing the problem, we can adopt an effective model for each process. In our study, we use t...

BIOSMILE: Adapting Semantic Role Labeling for Biomedical Verbs

In this paper, we construct a biomedical semantic role labeling (SRL) system that can be used to facilitate relation extraction. First, we construct a proposition bank on top of the popular biomedical GENIA treebank following the PropBank annotation scheme. We only annotate the predicate-argument structures (PAS’s) of thirty frequently used biomedical predicates and their corresponding argument...

A Hybrid Approach to Biomedical Named Entity Recognition and Semantic Role Labeling

In this paper, we describe our hybrid approach to two key NLP technologies: biomedical named entity recognition (Bio-NER) and biomedical semantic role labeling (Bio-SRL). In Bio-NER, our system successfully integrates linguistic features into the CRF framework. In addition, we employ web lexicons and template-based post-processing to further boost its performance. Through these broad linguistic features and the nature of CRF, o...

Discriminative Learning of Syntactic and Semantic Dependencies

This paper presents a Maximum Entropy Model-based system for discriminative learning of syntactic and semantic dependencies, submitted to the CoNLL-2008 shared task (Surdeanu et al., 2008). The system converts the dependency learning task into classification problems and reconstructs the dependency relations from the classification results. Finally, F1 scores of 86.69, 69.95 and 78.35 are obta...

Publication date: 2008